{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 准备工作"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "公众号 = \"人人都是产品经理\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 76,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 输出 \n",
    "fn = { \"output\" : { \"公众号_htm_snippets\": \"data_raw_src/公众号_htm_snippets_{公众号}.xlsx\",\n",
    "                    \"公众号_df\": \"data_raw_src/公众号_df_{公众号}.tsv\",\n",
    "                    \"公众号_xlsx\": \"data_sets/公众号_url_{公众号}.xlsx\" } \\\n",
    "      }"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 采集公众号（requests）"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 总结\n",
    "1、登录微信公众号平台\n",
    "\n",
    "2、选择管理的素材管理\n",
    "\n",
    "3、点击新建图文消息\n",
    "\n",
    "4、点击超链接\n",
    "\n",
    "5、右键检查，network，XHR，在刷新页面\n",
    "\n",
    "6、选择其他公众号，并输入想要爬取的公众号\n",
    "\n",
    "7、点击XHR的appmsg?action=list_ex,便可以获取cookie、user-agent等信息"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 70,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0\n",
      "5\n",
      "10\n",
      "15\n",
      "20\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>title</th>\n",
       "      <th>link</th>\n",
       "      <th>create_time</th>\n",
       "      <th>格式化时间</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>这是产品新人真正需要的一次项目评审会，全网直播</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "      <td>1589555908</td>\n",
       "      <td>2020-05-15</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>一个有趣的404页面，是如何体现的？</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "      <td>1589555908</td>\n",
       "      <td>2020-05-15</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>6000字，告诉你社群运营怎么做</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "      <td>1589555908</td>\n",
       "      <td>2020-05-15</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>公开课 | 我0基础，距离大厂还有多远？</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "      <td>1589555908</td>\n",
       "      <td>2020-05-15</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>B端产品，四招搞定老板不靠谱需求</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "      <td>1589555908</td>\n",
       "      <td>2020-05-15</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>119</th>\n",
       "      <td>大家都在说的「商业模式」，到底是什么？</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "      <td>1587475277</td>\n",
       "      <td>2020-04-21</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>120</th>\n",
       "      <td>有效提升文案能力，还能做文案副业的方法，在这里！</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "      <td>1587475277</td>\n",
       "      <td>2020-04-21</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>121</th>\n",
       "      <td>企业自媒体运营完全手册：抖音、微信、微博、知乎、小红书 6 大平台怎么做？</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "      <td>1587475277</td>\n",
       "      <td>2020-04-21</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>122</th>\n",
       "      <td>限时免费 | 8年经验产品导师，4招教你解锁转岗产品正确姿势！</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "      <td>1587475277</td>\n",
       "      <td>2020-04-21</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>123</th>\n",
       "      <td>微信灰度测试：订阅号放弃时间排序了？</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "      <td>1587475277</td>\n",
       "      <td>2020-04-21</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>124 rows × 4 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                                     title  \\\n",
       "0                  这是产品新人真正需要的一次项目评审会，全网直播   \n",
       "1                       一个有趣的404页面，是如何体现的？   \n",
       "2                         6000字，告诉你社群运营怎么做   \n",
       "3                     公开课 | 我0基础，距离大厂还有多远？   \n",
       "4                         B端产品，四招搞定老板不靠谱需求   \n",
       "..                                     ...   \n",
       "119                    大家都在说的「商业模式」，到底是什么？   \n",
       "120               有效提升文案能力，还能做文案副业的方法，在这里！   \n",
       "121  企业自媒体运营完全手册：抖音、微信、微博、知乎、小红书 6 大平台怎么做？   \n",
       "122        限时免费 | 8年经验产品导师，4招教你解锁转岗产品正确姿势！   \n",
       "123                     微信灰度测试：订阅号放弃时间排序了？   \n",
       "\n",
       "                                                  link  create_time  \\\n",
       "0    http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...   1589555908   \n",
       "1    http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...   1589555908   \n",
       "2    http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...   1589555908   \n",
       "3    http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...   1589555908   \n",
       "4    http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...   1589555908   \n",
       "..                                                 ...          ...   \n",
       "119  http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...   1587475277   \n",
       "120  http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...   1587475277   \n",
       "121  http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...   1587475277   \n",
       "122  http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...   1587475277   \n",
       "123  http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...   1587475277   \n",
       "\n",
       "          格式化时间  \n",
       "0    2020-05-15  \n",
       "1    2020-05-15  \n",
       "2    2020-05-15  \n",
       "3    2020-05-15  \n",
       "4    2020-05-15  \n",
       "..          ...  \n",
       "119  2020-04-21  \n",
       "120  2020-04-21  \n",
       "121  2020-04-21  \n",
       "122  2020-04-21  \n",
       "123  2020-04-21  \n",
       "\n",
       "[124 rows x 4 columns]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# 目标url\n",
    "\n",
    "import time\n",
    "import requests\n",
    "import pandas as pd\n",
    "import csv\n",
    "\n",
    "\n",
    "url = \"https://mp.weixin.qq.com/cgi-bin/appmsg\"\n",
    "\n",
    "# 使用Cookie，跳过登陆操作\n",
    "headers = {\n",
    "  \"Cookie\":\"RK=7a54SqJAEm; ptcz=184bc2598b9b86c91826ecb600133e1a6123fa8fcf6cddd67cb09545ee7a40ab; pgv_pvi=4352528384; pgv_pvid=3243063360; tvfe_boss_uuid=9550c4b7ec3c201f; ua_id=K1zuXYK3fRqy8aC2AAAAABxj_Vrc6YeyBJPr-ZqgAQE=; mm_lang=zh_CN; wxuin=63093786147366; o_cookie=424362655; pac_uid=1_424362655; ied_qq=o0424362655; openid2ticket_oj2iB0VkHB0f9mUlGU95ZeZbjySs=Z4wlcFP5qOjOh8bCCRwgKMLRWYTjwQxHbUwFKFVNOs0=; pgv_si=s5864681472; uuid=5b12ef4e05eda6bb5688a7dadda2c2f0; rand_info=CAESIG2qwTpLk69FSdz1bSomHVu/y5qn9/HsXIcvyIbXaeDv; slave_bizuin=3514919055; data_bizuin=3514919055; bizuin=3514919055; data_ticket=/DgajlVc4dNPbGFMIYT2lf+pQrn2Efzsv0tw2Xb0iW//cb4RMd0tWCOKQpWllBMP; slave_sid=VW1WQ3I1ZW11WlJMVGUzM1VKR21mS0dTZGJWc3NiUTJnWjBXQ2pTV0FBRk1mYVBNWjVhdFBlZ1lXTFhpZkk3azN4UW1Db3IwTWRnZlEyb2lYRWJsd0cwdExjaG1ranJtUjZfN2tjU1BrZG5wQXFEbFZKTFNVRUNucm9NQTFMT3R1NDZKakY1eVBVNkcyT1hw; slave_user=gh_a72a331ddcfd; xid=0b6ae4dfc17b3cd5f606698ba391810d\",\n",
    "    \"User-Agent\": \"Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Mobile Safari/537.36\"\n",
    "}\n",
    "\n",
    "data = {\n",
    "    \"token\": \"1734233808\",\n",
    "    \"lang\": \"zh_CN\",\n",
    "    \"f\": \"json\",\n",
    "    \"ajax\": \"1\",\n",
    "    \"action\": \"list_ex\",\n",
    "    \"begin\": \"0\",\n",
    "    \"count\": \"5\",\n",
    "    \"query\": \"\",\n",
    "    \"fakeid\": \"MjM5OTEwNjI2MA==\", # faceid 对应公众号名称\n",
    "    \"type\": \"9\",\n",
    "}\n",
    "\n",
    "\n",
    "\n",
    "content_list=[]\n",
    "\n",
    "for i in range(5):\n",
    "    data[\"begin\"] = i*5\n",
    "    print(data[\"begin\"])\n",
    "    time.sleep(3)\n",
    "    # 使用get方法进行提交\n",
    "    content_json = requests.get(url, headers=headers, params=data).json()\n",
    "#     print(content_json)\n",
    "    # 返回了一个json，里面是每一页的数据\n",
    "    for item in content_json[\"app_msg_list\"]:\n",
    "    # 提取每页文章的标题及对应的url\n",
    "        items = []\n",
    "        items.append(item[\"title\"])\n",
    "        items.append(item[\"link\"])\n",
    "        items.append(item[\"create_time\"])\n",
    "        content_list.append(items)\n",
    "\n",
    "\n",
    "\n",
    "name=['title','link','create_time']\n",
    "test=pd.DataFrame(columns=name,data=content_list)\n",
    "# 添加格式化时间\n",
    "test = test.assign(\n",
    "    格式化时间 = lambda x: pd.to_datetime(x[\"create_time\"], unit='s').dt.strftime('%Y-%m-%d')\n",
    ")\n",
    "display(test)\n",
    "\n",
    "with pd.ExcelWriter(fn[\"output\"][\"公众号_xlsx\"].format(公众号=\"人人都是产品经理_requests\")) as writer:\n",
    "    test.to_excel(writer)\n",
    "\n",
    "# test.to_csv(\"../微信公众号爬虫_zhichao/南方周末.csv\",mode='a',encoding='utf-8')\n",
    "# print(\"保存成功\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 采集公众号（selenium）"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np\n",
    "from lxml.html import fromstring\n",
    "import time\n",
    "from random import random\n",
    "\n",
    "# when selenium main_content is used\n",
    "# Parses an HTML document from a string constant.  Returns the root nood\n",
    "# root = fromstring(df.loc[1,\"html_snippets\"]) "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 使用Selenium\n",
    "* 要更改 opts.binary_location 至自己本地的Chrome浏览器，建议portable\n",
    "* Chrome浏览器 和 chromedriver.exe要同版本号到小数后一位\n",
    "* 要确保可以 开启浏览器机器人\n",
    "* 要确保浏览器机器人 可以打开网页 driver.get(\"https://mp.weixin.qq.com\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### chromedriver的安装\n",
    "* 检查chrome位置和更新版本\n",
    "* 下载[chromedriver](https://chromedriver.storage.googleapis.com/index.html?path=81.0.4044.138/)，并放在chrome.exe同一个目录下\n",
    "* 配置chrome.exe路径的环境变量\n",
    "  * 此电脑-属性-高级系统设置-环境变量-Path-填入chrome.exe所在的路径，如：C:\\Users\\42436\\AppData\\Local\\Google\\Chrome\\Application"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "C:\\ProgramData\\Anaconda3\\lib\\site-packages\\ipykernel_launcher.py:16: DeprecationWarning: use options instead of chrome_options\n",
      "  app.launch_new_instance()\n"
     ]
    }
   ],
   "source": [
    "from selenium import webdriver\n",
    "from selenium.webdriver.common.desired_capabilities import DesiredCapabilities\n",
    "\n",
    "#caps=dict()\n",
    "#caps[\"pageLoadStrategy\"] = \"none\"   # Do not wait for full page load\n",
    "\n",
    "opts = webdriver.ChromeOptions()\n",
    "opts.add_argument('--no-sandbox')#解决DevToolsActivePort文件不存在的报错\n",
    "opts.add_argument('window-size=1920x3000') #指定浏览器分辨率\n",
    "opts.add_argument('--disable-gpu') #谷歌文档提到需要加上一这个属性来规避bug\n",
    "opts.add_argument('--hide-scrollbars') #隐藏滚动条, 应对些特殊页面\n",
    "#opts.add_argument('blink-settings=imagesEnabled=false') #不加载图片, 提升速度\n",
    "#opts.add_argument('--headless') #浏览器不提供可视化页面. linux下如果系统不支持可视化不加这条会启动失败\n",
    "\n",
    "opts.binary_location = r\"C:\\Users\\42436\\AppData\\Local\\Google\\Chrome\\Application\\chrome.exe\"# \"H:\\_coding_\\Gitee\\InternetNewMedia\\CapstonePrj2016\\chromedriver.exe\"  \n",
    "driver = webdriver.Chrome( chrome_options = opts)\n",
    "#desired_capabilities=caps,"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {},
   "outputs": [],
   "source": [
    "driver.get(\"https://mp.weixin.qq.com\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 填表登入"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "selenium 的定位方法\n",
    "* find_element_by_id &ensp;&ensp;&ensp;  根据标签id定位\n",
    "* find_element_by_name   &ensp;&ensp;&ensp; 根据标签的name定位\n",
    "* find_element_by_xpath  &ensp;&ensp;&ensp; 根据xpath定位\n",
    "* find_element_by_link_text  &ensp;&ensp;&ensp; 通过文字链接来定位元素\n",
    "* find_element_by_partial_link_text  &ensp;&ensp;&ensp;  通过文字链接来定位元素\n",
    "* find_element_by_tag_name  &ensp;&ensp;&ensp;  根据标签的名字定位\n",
    "* find_element_by_class_name  &ensp;&ensp;&ensp; 通过class name 定位\n",
    "* find_element_by_css_selector  &ensp;&ensp;&ensp;  根据元素属性来定位"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {},
   "outputs": [],
   "source": [
    "payload =  {\"account\": \"424362655@qq.com\", \"password\": \"lbn.985064\"}\n",
    "# payload =  {\"account\": \"NFUHacks@163.com\", \"password\": \"NFU706947580\"}\n",
    "driver.find_element_by_xpath('//div[@class=\"login__type__container login__type__container__scan\"]/a').click()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "WebDriver 常用方法：\n",
    "* clear()清楚文本\n",
    "* send_keys(values)模拟按键输入\n",
    "* click()模拟点击\n",
    "* submit模拟提交"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {},
   "outputs": [],
   "source": [
    "driver.find_element_by_xpath('//form[@class=\"login_form\"]//input[@name=\"account\"]').clear()\n",
    "driver.find_element_by_xpath('//form[@class=\"login_form\"]//input[@name=\"account\"]').send_keys(payload['account'])\n",
    "driver.find_element_by_xpath('//form[@class=\"login_form\"]//input[@name=\"password\"]').clear()\n",
    "driver.find_element_by_xpath('//form[@class=\"login_form\"]//input[@name=\"password\"]').send_keys(payload['password'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {},
   "outputs": [],
   "source": [
    "driver.find_element_by_xpath('//div[@class=\"login_btn_panel\"]/a').click()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 点选单"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "其他常用方法\n",
    "* size：返回元素的尺寸\n",
    "* text：获取元素的文本\n",
    "* get_attribute：获取属性值  &ensp;&ensp;&ensp; get_attribute('innerHTML')获取元素内的全部HTML\n",
    "* is_displayed()：设置该元素用户是否可见"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'展开'"
      ]
     },
     "execution_count": 41,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "element = driver.find_element_by_xpath('//a[@id=\"m_open\"]')\n",
    "element.click()\n",
    "main_content = element.get_attribute('innerHTML')\n",
    "main_content"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {},
   "outputs": [],
   "source": [
    "driver.execute_script(\"window.scrollTo(0,document.body.scrollHeight)\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'https://mp.weixin.qq.com/cgi-bin/appmsg?begin=0&count=10&t=media/appmsg_list&type=10&action=list&token=1510336661&lang=zh_CN'"
      ]
     },
     "execution_count": 43,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "element = driver.find_element_by_xpath('//li[@title[contains(.,\"素材管理\")]]/a') \n",
    "# main_content = element.get_attribute('innerHTML')\n",
    "# main_content\n",
    "url2= element.get_attribute(\"href\")\n",
    "url2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {},
   "outputs": [],
   "source": [
    "driver.get(url2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 新建图文消息"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "metadata": {},
   "outputs": [],
   "source": [
    "element = driver.find_element_by_xpath('//*[text()[contains(.,\"新建图文消息\")]]') \n",
    "main_content = element.get_attribute('innerHTML')\n",
    "main_content\n",
    "element.click()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 49,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['CDwindow-B6B5775AF15663E118505FAD4D6109CE', 'CDwindow-2F44EC2D8120EDE891D9BE835A54FE28', 'CDwindow-A8F0B9BA16142C40558913A95DDF7F8C']\n"
     ]
    }
   ],
   "source": [
    "print (driver.window_handles)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 50,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 新建图文消息开了另一分视窗，所以要切换 switch_to \n",
    "driver.switch_to.window(driver.window_handles[-1])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 超链接"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* 当出现报错的话，可以尝试把页面回复到正常大小。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 51,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "                超链接              \n"
     ]
    }
   ],
   "source": [
    "element = driver.find_element_by_xpath('//*[text()[contains(.,\"超链接\")]]') \n",
    "main_content = element.get_attribute('innerHTML')\n",
    "print(main_content)\n",
    "element.click()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 52,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "选择其他公众号\n"
     ]
    }
   ],
   "source": [
    "# 点 选择其他公众号\n",
    "element = driver.find_element_by_xpath('//*[text()[contains(.,\"选择其他公众号\")]]') \n",
    "main_content = element.get_attribute('innerHTML')\n",
    "print(main_content)\n",
    "element.click()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 53,
   "metadata": {},
   "outputs": [],
   "source": [
    "driver.find_element_by_xpath('//form//div[@class=\"inner_link_account_area\"]//input[@class=\"weui-desktop-form__input\"]').clear()\n",
    "driver.find_element_by_xpath('//form//div[@class=\"inner_link_account_area\"]//input[@class=\"weui-desktop-form__input\"]').send_keys(公众号)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 54,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<div class=\"weui-desktop-icon weui-desktop-icon__inputSearch weui-desktop-icon__small\"><!----> <!----> <!----> <!----> <!----> <!----> <!----> <!----> <!----> <!----> <!----> <!----> <!----> <!----> <!----> <!----> <!----> <!----> <!----> <!----> <!----> <!----> <!----> <!----> <!----> <!----> <!----> <!----> <!----> <!----> <!----> <!----> <!----> <!----> <!----> <!----> <!----> <!----> <!----> <!----> <!----> <!----> <!----> <!----> <!----> <!----> <!----> <!----> <!----> <!----> <!----> <!----> <!----> <!----> <svg width=\"16\" height=\"16\" viewBox=\"0 0 16 16\" xmlns=\"http://www.w3.org/2000/svg\"><path d=\"M11.33 10.007l4.273 4.273a.502.502 0 0 1 .005.709l-.585.584a.499.499 0 0 1-.709-.004L10.046 11.3a6.278 6.278 0 1 1 1.284-1.294zm.012-3.729a5.063 5.063 0 1 0-10.127 0 5.063 5.063 0 0 0 10.127 0z\"></path></svg> <!----> <!----> <!----> <!----></div>\n"
     ]
    }
   ],
   "source": [
    "# 点放大镜搜\n",
    "element = driver.find_element_by_xpath('//button[@class=\"weui-desktop-icon-btn weui-desktop-search__btn\"]')\n",
    "main_content = element.get_attribute('innerHTML')\n",
    "print(main_content)\n",
    "element.click()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 55,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<li class=\"inner_link_account_item\"><div class=\"weui-desktop-vm_primary\"><img src=\"http://mmbiz.qpic.cn/mmbiz_png/icHOSb47jqpW8eu1Xia8lDpvO4Y7YX8NZFIUkeqJicXHRXhFYibTC3BoUmHbufdr28w3euyDOuKoORSew5uESSGicibA/0?wx_fmt=png\" class=\"inner_link_account_avatar\"> <strong class=\"inner_link_account_nickname\">人人都是产品经理</strong> <i class=\"inner_link_account_wechat\">微信号：woshipm</i></div> <div class=\"weui-desktop-vm_default inner_link_account_type\">订阅号</div></li><li class=\"inner_link_account_item\"><div class=\"weui-desktop-vm_primary\"><img src=\"http://mmbiz.qpic.cn/mmbiz_png/j1k2QmgicFeD25XrJBGmtNWHvvhYcBz0oD3teygKa1ZeIEwficduqicjBk1DEBMPOP9PibWayhQibTPtxpd3EsicDf7Q/0?wx_fmt=png\" class=\"inner_link_account_avatar\"> <strong class=\"inner_link_account_nickname\">人人都是产品经理服务号</strong> <i class=\"inner_link_account_wechat\">微信号：woshipmfwh</i></div> <div class=\"weui-desktop-vm_default inner_link_account_type\">服务号</div></li><li class=\"inner_link_account_item\"><div class=\"weui-desktop-vm_primary\"><img src=\"http://mmbiz.qpic.cn/mmbiz_png/icQ2YibI9ribfMTeUicZxwYXWhTdNDY3MON1O0ibwBfQmlNOhuG5mGbzWM3b5yZPyNRx61ly26icwxtuD4U0R97CN2Zw/0?wx_fmt=png\" class=\"inner_link_account_avatar\"> <strong class=\"inner_link_account_nickname\">人人不都是产品经理</strong> <i class=\"inner_link_account_wechat\">微信号：DataProduct</i></div> <div class=\"weui-desktop-vm_default inner_link_account_type\">订阅号</div></li><li class=\"inner_link_account_item\"><div class=\"weui-desktop-vm_primary\"><img src=\"http://mmbiz.qpic.cn/mmbiz_png/P1hcMQ3PQVRseZGwfUEXXgaVWoqkJXSYRW6sRMHkQEjjRklGCLH5JPtLibmpSOib2n3K0A4Zf99rQk24qDsX8XCA/0?wx_fmt=png\" class=\"inner_link_account_avatar\"> <strong class=\"inner_link_account_nickname\">人人都是产品经理吗</strong> <i class=\"inner_link_account_wechat\">微信号：未设置</i></div> <div class=\"weui-desktop-vm_default inner_link_account_type\">订阅号</div></li><li class=\"inner_link_account_item\"><div class=\"weui-desktop-vm_primary\"><img src=\"http://mmbiz.qpic.cn/mmbiz_png/80kt4zPNYiaBSudUygykLCe2FOTG7Dibib7l5yNeMNqdgSm8cFa887lq2bNDyICo5sCxf2VCskW1tKhrG1S0gJ2kQ/0?wx_fmt=png\" class=\"inner_link_account_avatar\"> <strong class=\"inner_link_account_nickname\">人人都不是产品经理</strong> <i class=\"inner_link_account_wechat\">微信号：未设置</i></div> <div class=\"weui-desktop-vm_default inner_link_account_type\">订阅号</div></li>\n"
     ]
    }
   ],
   "source": [
    "element = driver.find_element_by_xpath('//ul[@class=\"inner_link_account_list\"]')\n",
    "main_content = element.get_attribute('innerHTML')\n",
    "print(main_content)\n",
    "公众号SERP = main_content\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 56,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 解析\n",
    "root = fromstring(公众号SERP) "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 57,
   "metadata": {},
   "outputs": [],
   "source": [
    "主 = root.xpath('//li[@class=\"inner_link_account_item\"]')\n",
    "\n",
    "account_list = []\n",
    "for e in 主:\n",
    "    account_nickname = e.xpath('./div/strong[@class=\"inner_link_account_nickname\"]')[0].text\n",
    "    account_wechat = e.xpath('./div/i[@class=\"inner_link_account_wechat\"]')[0].text\n",
    "    account_img = e.xpath('./div/img/@src')[0]\n",
    "    account = {\"nickname\": account_nickname, \"wechat\": account_wechat, \"img\": account_img,}\n",
    "    account_list.append(account)\n",
    "\n",
    "df_account = pd.DataFrame(account_list)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 58,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>nickname</th>\n",
       "      <th>wechat</th>\n",
       "      <th>img</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>人人都是产品经理</td>\n",
       "      <td>微信号：woshipm</td>\n",
       "      <td>http://mmbiz.qpic.cn/mmbiz_png/icHOSb47jqpW8eu...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>人人都是产品经理服务号</td>\n",
       "      <td>微信号：woshipmfwh</td>\n",
       "      <td>http://mmbiz.qpic.cn/mmbiz_png/j1k2QmgicFeD25X...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>人人不都是产品经理</td>\n",
       "      <td>微信号：DataProduct</td>\n",
       "      <td>http://mmbiz.qpic.cn/mmbiz_png/icQ2YibI9ribfMT...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>人人都是产品经理吗</td>\n",
       "      <td>微信号：未设置</td>\n",
       "      <td>http://mmbiz.qpic.cn/mmbiz_png/P1hcMQ3PQVRseZG...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>人人都不是产品经理</td>\n",
       "      <td>微信号：未设置</td>\n",
       "      <td>http://mmbiz.qpic.cn/mmbiz_png/80kt4zPNYiaBSud...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      nickname           wechat  \\\n",
       "0     人人都是产品经理      微信号：woshipm   \n",
       "1  人人都是产品经理服务号   微信号：woshipmfwh   \n",
       "2    人人不都是产品经理  微信号：DataProduct   \n",
       "3    人人都是产品经理吗          微信号：未设置   \n",
       "4    人人都不是产品经理          微信号：未设置   \n",
       "\n",
       "                                                 img  \n",
       "0  http://mmbiz.qpic.cn/mmbiz_png/icHOSb47jqpW8eu...  \n",
       "1  http://mmbiz.qpic.cn/mmbiz_png/j1k2QmgicFeD25X...  \n",
       "2  http://mmbiz.qpic.cn/mmbiz_png/icQ2YibI9ribfMT...  \n",
       "3  http://mmbiz.qpic.cn/mmbiz_png/P1hcMQ3PQVRseZG...  \n",
       "4  http://mmbiz.qpic.cn/mmbiz_png/80kt4zPNYiaBSud...  "
      ]
     },
     "execution_count": 58,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_account"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 59,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<div class=\"weui-desktop-vm_primary\"><img src=\"http://mmbiz.qpic.cn/mmbiz_png/icHOSb47jqpW8eu1Xia8lDpvO4Y7YX8NZFIUkeqJicXHRXhFYibTC3BoUmHbufdr28w3euyDOuKoORSew5uESSGicibA/0?wx_fmt=png\" class=\"inner_link_account_avatar\"> <strong class=\"inner_link_account_nickname\">人人都是产品经理</strong> <i class=\"inner_link_account_wechat\">微信号：woshipm</i></div> <div class=\"weui-desktop-vm_default inner_link_account_type\">订阅号</div>\n"
     ]
    }
   ],
   "source": [
    "element = driver.find_element_by_xpath('//ul[@class=\"inner_link_account_list\"]/li')\n",
    "main_content = element.get_attribute('innerHTML')\n",
    "print(main_content)\n",
    "element.click()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 60,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'\\n跳转_input = driver.find_element_by_xpath(\\'//span[@class=\"weui-desktop-pagination__form\"]/input\\')\\n跳转_a = driver.find_element_by_xpath(\\'//span[@class=\"weui-desktop-pagination__form\"]/a\\')\\n跳转_input.clear()\\n跳转_input.send_keys(2)\\n跳转_a.click()\\n'"
      ]
     },
     "execution_count": 60,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 跳转testing\n",
    "'''\n",
    "跳转_input = driver.find_element_by_xpath('//span[@class=\"weui-desktop-pagination__form\"]/input')\n",
    "跳转_a = driver.find_element_by_xpath('//span[@class=\"weui-desktop-pagination__form\"]/a')\n",
    "跳转_input.clear()\n",
    "跳转_input.send_keys(2)\n",
    "跳转_a.click()\n",
    "'''"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 61,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[1, 472]\n",
      "False\n"
     ]
    }
   ],
   "source": [
    "# 跳转上限\n",
    "l_e = driver.find_elements_by_xpath('//label[@class=\"weui-desktop-pagination__num\"]')\n",
    "l_e_int  = [int(x.text) for x in l_e] \n",
    "print (l_e_int)\n",
    "print (l_e_int[0]==l_e_int[-1])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 62,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299, 300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312, 313, 314, 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 325, 326, 327, 328, 329, 330, 331, 332, 333, 334, 335, 336, 337, 338, 339, 340, 341, 342, 343, 344, 345, 346, 347, 348, 349, 350, 351, 352, 353, 354, 355, 356, 357, 358, 359, 360, 361, 362, 363, 364, 365, 366, 367, 368, 369, 370, 371, 372, 373, 374, 375, 376, 377, 378, 379, 380, 381, 382, 383, 384, 385, 386, 387, 388, 389, 390, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400, 401, 402, 403, 404, 405, 406, 407, 408, 409, 410, 411, 412, 413, 414, 415, 416, 417, 418, 419, 420, 421, 422, 423, 424, 425, 426, 427, 428, 429, 430, 431, 432, 433, 434, 435, 436, 437, 438, 439, 440, 441, 442, 443, 444, 445, 446, 447, 448, 449, 450, 451, 452, 453, 454, 455, 456, 457, 458, 459, 460, 461, 462, 463, 464, 465, 466, 467, 468, 469, 470, 471, 472]\n"
     ]
    }
   ],
   "source": [
    "pages = list(range(l_e_int[0],l_e_int[-1]+1 ))\n",
    "#print(pages[0:2])\n",
    "pages = list(range(1,l_e_int[-1]+1 ))\n",
    "print(pages)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 循环/遍历"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 63,
   "metadata": {},
   "outputs": [],
   "source": [
    "# global varialbes \n",
    "html_raw = dict()\n",
    "main_content =\"\"\n",
    "element = None"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 64,
   "metadata": {},
   "outputs": [],
   "source": [
    "def process_pages (pages):\n",
    "    for p in pages:\n",
    "        print (p,end='\\t')\n",
    "\n",
    "        跳转_input = driver.find_element_by_xpath('//span[@class=\"weui-desktop-pagination__form\"]/input')\n",
    "        跳转_a = driver.find_element_by_xpath('//span[@class=\"weui-desktop-pagination__form\"]/a')\n",
    "        跳转_input.clear()\n",
    "        跳转_input.send_keys(p)\n",
    "        跳转_a.click()\n",
    "\n",
    "        time.sleep(5+5*random())\n",
    "\n",
    "        element = driver.find_element_by_xpath('//div[@class=\"inner_link_article_list\"]')\n",
    "        main_content = element.get_attribute('innerHTML')\n",
    "        #print(main_content)\n",
    "        html_raw[p] = main_content"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 65,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1\t2\t3\t4\t5\t6\t7\t8\t9\t10\t11\t12\t13\t14\t15\t16\t17\t18\t19\t20\t21\t22\t23\t24\t25\t26\t27\t28\t29\t30\t31\t32\t33\t34\t35\t36\t37\t38\t39\t40\t41\t42\t43\t44\t45\t46\t47\t48\t49\t50\t51\t52\t53\t54\t55\t56\t57\t58\t59\t60\t61\t62\t63\t64\t65\t66\t67\t68\t69\t70\t71\t72\t73\t74\t75\t76\t77\t78\t79\t80\t81\t82\t83\t84\t85\t86\t87\t88\t89\t90\t91\t92\t93\t94\t95\t96\t97\t98\t99\t100\t101\t102\t103\t104\t105\t106\t107\t108\t109\t110\t111\t112\t113\t114\t115\t116\t117\t118\t119\t120\t121\t122\t123\t124\t125\t126\t127\t128\t129\t130\t131\t132\t133\t134\t135\t136\t137\t138\t139\t140\t141\t142\t143\t144\t145\t146\t147\t148\t149\t150\t151\t152\t153\t154\t155\t156\t157\t158\t159\t160\t161\t162\t163\t164\t165\t166\t167\t168\t169\t170\t171\t172\t173\t174\t175\t176\t177\t178\t179\t180\t181\t182\t183\t184\t185\t186\t187\t188\t189\t190\t191\t192\t193\t194\t195\t196\t197\t198\t199\t200\t201\t202\t203\t204\t205\t206\t207\t208\t209\t210\t211\t212\t213\t214\t215\t216\t217\t218\t219\t220\t221\t222\t223\t224\t225\t226\t227\t228\t229\t230\t231\t232\t233\t234\t235\t236\t237\t238\t239\t240\t241\t242\t243\t244\t245\t246\t247\t248\t249\t250\t251\t252\t253\t254\t255\t256\t257\t258\t259\t260\t261\t262\t263\t264\t265\t266\t267\t268\t269\t270\t271\t272\t273\t274\t275\t276\t277\t278\t279\t280\t281\t282\t283\t284\t285\t286\t287\t288\t289\t290\t291\t292\t293\t294\t295\t296\t297\t298\t299\t300\t301\t302\t303\t304\t305\t306\t307\t308\t309\t310\t311\t312\t313\t314\t315\t316\t317\t318\t319\t320\t321\t322\t323\t324\t325\t326\t327\t328\t329\t330\t331\t332\t333\t334\t335\t336\t337\t338\t339\t340\t341\t342\t343\t344\t345\t346\t347\t348\t349\t350\t351\t352\t353\t354\t355\t356\t357\t358\t359\t360\t361\t362\t363\t364\t365\t366\t367\t368\t369\t370\t371\t372\t373\t374\t375\t376\t377\t378\t379\t380\t381\t382\t383\t384\t385\t386\t387\t388\t389\t390\t391\t392\t393\t394\t395\t396\t397\t398\t399\t400\t401\t402\t403\t404\t405\t406\t407\t408\t409\t410\t411\t412\t413\t414\t415\t416\t417\t418\t419\t420\t421\t422\t423\t424\t425\t426\t427\t428\t429\t430\t431\t432\t433\t434\t435\t436\t437\t438\t439\t440\t441\t442\t443\t444\t445\t446\t447\t448\t449\t450\t451\t452\t453\t454\t455\t456\t457\t458\t459\t460\t461\t462\t463\t464\t465\t466\t467\t468\t469\t470\t471\t472\t"
     ]
    }
   ],
   "source": [
    "process_pages(pages)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 66,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>html_snippets</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>&lt;div&gt;&lt;label class=\"inner_link_article_item\"&gt;&lt;s...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>&lt;div&gt;&lt;label class=\"inner_link_article_item\"&gt;&lt;s...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>&lt;div&gt;&lt;label class=\"inner_link_article_item\"&gt;&lt;s...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>&lt;div&gt;&lt;label class=\"inner_link_article_item\"&gt;&lt;s...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>&lt;div&gt;&lt;label class=\"inner_link_article_item\"&gt;&lt;s...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>468</th>\n",
       "      <td>&lt;div&gt;&lt;label class=\"inner_link_article_item\"&gt;&lt;s...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>469</th>\n",
       "      <td>&lt;div&gt;&lt;label class=\"inner_link_article_item\"&gt;&lt;s...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>470</th>\n",
       "      <td>&lt;div&gt;&lt;label class=\"inner_link_article_item\"&gt;&lt;s...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>471</th>\n",
       "      <td>&lt;div&gt;&lt;label class=\"inner_link_article_item\"&gt;&lt;s...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>472</th>\n",
       "      <td>&lt;div&gt;&lt;label class=\"inner_link_article_item\"&gt;&lt;s...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>472 rows × 1 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                                         html_snippets\n",
       "1    <div><label class=\"inner_link_article_item\"><s...\n",
       "2    <div><label class=\"inner_link_article_item\"><s...\n",
       "3    <div><label class=\"inner_link_article_item\"><s...\n",
       "4    <div><label class=\"inner_link_article_item\"><s...\n",
       "5    <div><label class=\"inner_link_article_item\"><s...\n",
       "..                                                 ...\n",
       "468  <div><label class=\"inner_link_article_item\"><s...\n",
       "469  <div><label class=\"inner_link_article_item\"><s...\n",
       "470  <div><label class=\"inner_link_article_item\"><s...\n",
       "471  <div><label class=\"inner_link_article_item\"><s...\n",
       "472  <div><label class=\"inner_link_article_item\"><s...\n",
       "\n",
       "[472 rows x 1 columns]"
      ]
     },
     "execution_count": 66,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = pd.DataFrame([html_raw]).T\n",
    "df.columns = [\"html_snippets\"]\n",
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 67,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Stored 'html_raw' (dict)\n"
     ]
    }
   ],
   "source": [
    "%store html_raw\n",
    "import pickle \n",
    "filehandler = open(\"html_raw\", 'wb') \n",
    "pickle.dump(html_raw, filehandler)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 68,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "60\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>html_snippets</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>&lt;div&gt;&lt;label class=\"inner_link_article_item\"&gt;&lt;s...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>62</th>\n",
       "      <td>&lt;div&gt;&lt;label class=\"inner_link_article_item\"&gt;&lt;s...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>63</th>\n",
       "      <td>&lt;div&gt;&lt;label class=\"inner_link_article_item\"&gt;&lt;s...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>64</th>\n",
       "      <td>&lt;div&gt;&lt;label class=\"inner_link_article_item\"&gt;&lt;s...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>65</th>\n",
       "      <td>&lt;div&gt;&lt;label class=\"inner_link_article_item\"&gt;&lt;s...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>468</th>\n",
       "      <td>&lt;div&gt;&lt;label class=\"inner_link_article_item\"&gt;&lt;s...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>469</th>\n",
       "      <td>&lt;div&gt;&lt;label class=\"inner_link_article_item\"&gt;&lt;s...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>470</th>\n",
       "      <td>&lt;div&gt;&lt;label class=\"inner_link_article_item\"&gt;&lt;s...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>471</th>\n",
       "      <td>&lt;div&gt;&lt;label class=\"inner_link_article_item\"&gt;&lt;s...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>472</th>\n",
       "      <td>&lt;div&gt;&lt;label class=\"inner_link_article_item\"&gt;&lt;s...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>412 rows × 1 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                                         html_snippets\n",
       "12   <div><label class=\"inner_link_article_item\"><s...\n",
       "62   <div><label class=\"inner_link_article_item\"><s...\n",
       "63   <div><label class=\"inner_link_article_item\"><s...\n",
       "64   <div><label class=\"inner_link_article_item\"><s...\n",
       "65   <div><label class=\"inner_link_article_item\"><s...\n",
       "..                                                 ...\n",
       "468  <div><label class=\"inner_link_article_item\"><s...\n",
       "469  <div><label class=\"inner_link_article_item\"><s...\n",
       "470  <div><label class=\"inner_link_article_item\"><s...\n",
       "471  <div><label class=\"inner_link_article_item\"><s...\n",
       "472  <div><label class=\"inner_link_article_item\"><s...\n",
       "\n",
       "[412 rows x 1 columns]"
      ]
     },
     "execution_count": 68,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_out = df[~df.duplicated()]\n",
    "print (len(df_out))\n",
    "df[df.duplicated()]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 69,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[12, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299, 300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312, 313, 314, 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 325, 326, 327, 328, 329, 330, 331, 332, 333, 334, 335, 336, 337, 338, 339, 340, 341, 342, 343, 344, 345, 346, 347, 348, 349, 350, 351, 352, 353, 354, 355, 356, 357, 358, 359, 360, 361, 362, 363, 364, 365, 366, 367, 368, 369, 370, 371, 372, 373, 374, 375, 376, 377, 378, 379, 380, 381, 382, 383, 384, 385, 386, 387, 388, 389, 390, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400, 401, 402, 403, 404, 405, 406, 407, 408, 409, 410, 411, 412, 413, 414, 415, 416, 417, 418, 419, 420, 421, 422, 423, 424, 425, 426, 427, 428, 429, 430, 431, 432, 433, 434, 435, 436, 437, 438, 439, 440, 441, 442, 443, 444, 445, 446, 447, 448, 449, 450, 451, 452, 453, 454, 455, 456, 457, 458, 459, 460, 461, 462, 463, 464, 465, 466, 467, 468, 469, 470, 471, 472]\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "[12,\n",
       " 62,\n",
       " 63,\n",
       " 64,\n",
       " 65,\n",
       " 66,\n",
       " 67,\n",
       " 68,\n",
       " 69,\n",
       " 70,\n",
       " 71,\n",
       " 72,\n",
       " 73,\n",
       " 74,\n",
       " 75,\n",
       " 76,\n",
       " 77,\n",
       " 78,\n",
       " 79,\n",
       " 80,\n",
       " 81,\n",
       " 82,\n",
       " 83,\n",
       " 84,\n",
       " 85,\n",
       " 86,\n",
       " 87,\n",
       " 88,\n",
       " 89,\n",
       " 90,\n",
       " 91,\n",
       " 92,\n",
       " 93,\n",
       " 94,\n",
       " 95,\n",
       " 96,\n",
       " 97,\n",
       " 98,\n",
       " 99,\n",
       " 100,\n",
       " 101,\n",
       " 102,\n",
       " 103,\n",
       " 104,\n",
       " 105,\n",
       " 106,\n",
       " 107,\n",
       " 108,\n",
       " 109,\n",
       " 110,\n",
       " 111,\n",
       " 112,\n",
       " 113,\n",
       " 114,\n",
       " 115,\n",
       " 116,\n",
       " 117,\n",
       " 118,\n",
       " 119,\n",
       " 120,\n",
       " 121,\n",
       " 122,\n",
       " 123,\n",
       " 124,\n",
       " 125,\n",
       " 126,\n",
       " 127,\n",
       " 128,\n",
       " 129,\n",
       " 130,\n",
       " 131,\n",
       " 132,\n",
       " 133,\n",
       " 134,\n",
       " 135,\n",
       " 136,\n",
       " 137,\n",
       " 138,\n",
       " 139,\n",
       " 140,\n",
       " 141,\n",
       " 142,\n",
       " 143,\n",
       " 144,\n",
       " 145,\n",
       " 146,\n",
       " 147,\n",
       " 148,\n",
       " 149,\n",
       " 150,\n",
       " 151,\n",
       " 152,\n",
       " 153,\n",
       " 154,\n",
       " 155,\n",
       " 156,\n",
       " 157,\n",
       " 158,\n",
       " 159,\n",
       " 160,\n",
       " 161,\n",
       " 162,\n",
       " 163,\n",
       " 164,\n",
       " 165,\n",
       " 166,\n",
       " 167,\n",
       " 168,\n",
       " 169,\n",
       " 170,\n",
       " 171,\n",
       " 172,\n",
       " 173,\n",
       " 174,\n",
       " 175,\n",
       " 176,\n",
       " 177,\n",
       " 178,\n",
       " 179,\n",
       " 180,\n",
       " 181,\n",
       " 182,\n",
       " 183,\n",
       " 184,\n",
       " 185,\n",
       " 186,\n",
       " 187,\n",
       " 188,\n",
       " 189,\n",
       " 190,\n",
       " 191,\n",
       " 192,\n",
       " 193,\n",
       " 194,\n",
       " 195,\n",
       " 196,\n",
       " 197,\n",
       " 198,\n",
       " 199,\n",
       " 200,\n",
       " 201,\n",
       " 202,\n",
       " 203,\n",
       " 204,\n",
       " 205,\n",
       " 206,\n",
       " 207,\n",
       " 208,\n",
       " 209,\n",
       " 210,\n",
       " 211,\n",
       " 212,\n",
       " 213,\n",
       " 214,\n",
       " 215,\n",
       " 216,\n",
       " 217,\n",
       " 218,\n",
       " 219,\n",
       " 220,\n",
       " 221,\n",
       " 222,\n",
       " 223,\n",
       " 224,\n",
       " 225,\n",
       " 226,\n",
       " 227,\n",
       " 228,\n",
       " 229,\n",
       " 230,\n",
       " 231,\n",
       " 232,\n",
       " 233,\n",
       " 234,\n",
       " 235,\n",
       " 236,\n",
       " 237,\n",
       " 238,\n",
       " 239,\n",
       " 240,\n",
       " 241,\n",
       " 242,\n",
       " 243,\n",
       " 244,\n",
       " 245,\n",
       " 246,\n",
       " 247,\n",
       " 248,\n",
       " 249,\n",
       " 250,\n",
       " 251,\n",
       " 252,\n",
       " 253,\n",
       " 254,\n",
       " 255,\n",
       " 256,\n",
       " 257,\n",
       " 258,\n",
       " 259,\n",
       " 260,\n",
       " 261,\n",
       " 262,\n",
       " 263,\n",
       " 264,\n",
       " 265,\n",
       " 266,\n",
       " 267,\n",
       " 268,\n",
       " 269,\n",
       " 270,\n",
       " 271,\n",
       " 272,\n",
       " 273,\n",
       " 274,\n",
       " 275,\n",
       " 276,\n",
       " 277,\n",
       " 278,\n",
       " 279,\n",
       " 280,\n",
       " 281,\n",
       " 282,\n",
       " 283,\n",
       " 284,\n",
       " 285,\n",
       " 286,\n",
       " 287,\n",
       " 288,\n",
       " 289,\n",
       " 290,\n",
       " 291,\n",
       " 292,\n",
       " 293,\n",
       " 294,\n",
       " 295,\n",
       " 296,\n",
       " 297,\n",
       " 298,\n",
       " 299,\n",
       " 300,\n",
       " 301,\n",
       " 302,\n",
       " 303,\n",
       " 304,\n",
       " 305,\n",
       " 306,\n",
       " 307,\n",
       " 308,\n",
       " 309,\n",
       " 310,\n",
       " 311,\n",
       " 312,\n",
       " 313,\n",
       " 314,\n",
       " 315,\n",
       " 316,\n",
       " 317,\n",
       " 318,\n",
       " 319,\n",
       " 320,\n",
       " 321,\n",
       " 322,\n",
       " 323,\n",
       " 324,\n",
       " 325,\n",
       " 326,\n",
       " 327,\n",
       " 328,\n",
       " 329,\n",
       " 330,\n",
       " 331,\n",
       " 332,\n",
       " 333,\n",
       " 334,\n",
       " 335,\n",
       " 336,\n",
       " 337,\n",
       " 338,\n",
       " 339,\n",
       " 340,\n",
       " 341,\n",
       " 342,\n",
       " 343,\n",
       " 344,\n",
       " 345,\n",
       " 346,\n",
       " 347,\n",
       " 348,\n",
       " 349,\n",
       " 350,\n",
       " 351,\n",
       " 352,\n",
       " 353,\n",
       " 354,\n",
       " 355,\n",
       " 356,\n",
       " 357,\n",
       " 358,\n",
       " 359,\n",
       " 360,\n",
       " 361,\n",
       " 362,\n",
       " 363,\n",
       " 364,\n",
       " 365,\n",
       " 366,\n",
       " 367,\n",
       " 368,\n",
       " 369,\n",
       " 370,\n",
       " 371,\n",
       " 372,\n",
       " 373,\n",
       " 374,\n",
       " 375,\n",
       " 376,\n",
       " 377,\n",
       " 378,\n",
       " 379,\n",
       " 380,\n",
       " 381,\n",
       " 382,\n",
       " 383,\n",
       " 384,\n",
       " 385,\n",
       " 386,\n",
       " 387,\n",
       " 388,\n",
       " 389,\n",
       " 390,\n",
       " 391,\n",
       " 392,\n",
       " 393,\n",
       " 394,\n",
       " 395,\n",
       " 396,\n",
       " 397,\n",
       " 398,\n",
       " 399,\n",
       " 400,\n",
       " 401,\n",
       " 402,\n",
       " 403,\n",
       " 404,\n",
       " 405,\n",
       " 406,\n",
       " 407,\n",
       " 408,\n",
       " 409,\n",
       " 410,\n",
       " 411,\n",
       " 412,\n",
       " 413,\n",
       " 414,\n",
       " 415,\n",
       " 416,\n",
       " 417,\n",
       " 418,\n",
       " 419,\n",
       " 420,\n",
       " 421,\n",
       " 422,\n",
       " 423,\n",
       " 424,\n",
       " 425,\n",
       " 426,\n",
       " 427,\n",
       " 428,\n",
       " 429,\n",
       " 430,\n",
       " 431,\n",
       " 432,\n",
       " 433,\n",
       " 434,\n",
       " 435,\n",
       " 436,\n",
       " 437,\n",
       " 438,\n",
       " 439,\n",
       " 440,\n",
       " 441,\n",
       " 442,\n",
       " 443,\n",
       " 444,\n",
       " 445,\n",
       " 446,\n",
       " 447,\n",
       " 448,\n",
       " 449,\n",
       " 450,\n",
       " 451,\n",
       " 452,\n",
       " 453,\n",
       " 454,\n",
       " 455,\n",
       " 456,\n",
       " 457,\n",
       " 458,\n",
       " 459,\n",
       " 460,\n",
       " 461,\n",
       " 462,\n",
       " 463,\n",
       " 464,\n",
       " 465,\n",
       " 466,\n",
       " 467,\n",
       " 468,\n",
       " 469,\n",
       " 470,\n",
       " 471,\n",
       " 472]"
      ]
     },
     "execution_count": 69,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "try_again = list(df[df.duplicated()].index)\n",
    "print(try_again)\n",
    "try_again = try_again + list (set(pages).difference(set(df.index.values)))\n",
    "try_again"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 暂存档"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 78,
   "metadata": {},
   "outputs": [],
   "source": [
    "filename = fn [\"output\"] [\"公众号_htm_snippets\"] \n",
    "df_out.to_excel(filename.format(公众号=公众号),encoding=\"utf8\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 79,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "25,26,24,25,25,25,25,25,20,24,25,26,24,25,25,24,25,25,25,24,21,5,5,21,25,25,20,22,27,21,23,26,24,24,25,22,24,22,25,26,21,18,28,23,14,16,23,19,24,28,22,23,24,25,21,21,24,21,17,24,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,23,"
     ]
    }
   ],
   "source": [
    "def parse_html_snippets(_snippet_):\n",
    "    root = fromstring(_snippet_) \n",
    "    title = [x.text for x in root.xpath('//div[@class=\"inner_link_article_title\"]')]\n",
    "    create_time = [x.text for x in root.xpath('//div[@class=\"inner_link_article_date\"]')]\n",
    "    link = [x for x in root.xpath('//a/@href')]\n",
    "    _df_ = pd.DataFrame({\"title\":title, \"create_time\": create_time, \"link\":link})\n",
    "    return(_df_)\n",
    "    \n",
    "l_df = []\n",
    "for p in pages:\n",
    "    _df_ = parse_html_snippets(df.loc[p,\"html_snippets\"])\n",
    "    print (len(_df_), end=\",\")\n",
    "    l_df.append(_df_)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 86,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>title</th>\n",
       "      <th>create_time</th>\n",
       "      <th>link</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>这是产品新人真正需要的一次项目评审会，全网直播</td>\n",
       "      <td>2020-05-16</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>一个有趣的404页面，是如何体现的？</td>\n",
       "      <td>2020-05-16</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>6000字，告诉你社群运营怎么做</td>\n",
       "      <td>2020-05-16</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>公开课 | 我0基础，距离大厂还有多远？</td>\n",
       "      <td>2020-05-16</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>B端产品，四招搞定老板不靠谱需求</td>\n",
       "      <td>2020-05-16</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10832</th>\n",
       "      <td>我在 P2P 公司，做了三年活动运营</td>\n",
       "      <td>2019-07-16</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10833</th>\n",
       "      <td>同一个岗位，他的薪资凭什么是别人的2倍？</td>\n",
       "      <td>2019-07-16</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10834</th>\n",
       "      <td>身为产品，如何做向上管理？</td>\n",
       "      <td>2019-07-16</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10835</th>\n",
       "      <td>限额免费丨文案小白必学：3天摆脱写不出文案、文案逻辑乱问题</td>\n",
       "      <td>2019-07-16</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10836</th>\n",
       "      <td>一份有关如何提高创意思维的清单！</td>\n",
       "      <td>2019-07-16</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>10837 rows × 3 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                               title create_time  \\\n",
       "0            这是产品新人真正需要的一次项目评审会，全网直播  2020-05-16   \n",
       "1                 一个有趣的404页面，是如何体现的？  2020-05-16   \n",
       "2                   6000字，告诉你社群运营怎么做  2020-05-16   \n",
       "3               公开课 | 我0基础，距离大厂还有多远？  2020-05-16   \n",
       "4                   B端产品，四招搞定老板不靠谱需求  2020-05-16   \n",
       "...                              ...         ...   \n",
       "10832             我在 P2P 公司，做了三年活动运营  2019-07-16   \n",
       "10833           同一个岗位，他的薪资凭什么是别人的2倍？  2019-07-16   \n",
       "10834                  身为产品，如何做向上管理？  2019-07-16   \n",
       "10835  限额免费丨文案小白必学：3天摆脱写不出文案、文案逻辑乱问题  2019-07-16   \n",
       "10836               一份有关如何提高创意思维的清单！  2019-07-16   \n",
       "\n",
       "                                                    link  \n",
       "0      http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "1      http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "2      http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "3      http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "4      http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "...                                                  ...  \n",
       "10832  http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "10833  http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "10834  http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "10835  http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "10836  http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "\n",
       "[10837 rows x 3 columns]"
      ]
     },
     "execution_count": 86,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_url_out = pd.concat(l_df).reset_index(drop=True)\n",
    "# df_url_out.loc[0:10]\n",
    "df_url_out"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 87,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>title</th>\n",
       "      <th>create_time</th>\n",
       "      <th>link</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>10832</th>\n",
       "      <td>我在 P2P 公司，做了三年活动运营</td>\n",
       "      <td>2019-07-16</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10833</th>\n",
       "      <td>同一个岗位，他的薪资凭什么是别人的2倍？</td>\n",
       "      <td>2019-07-16</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10834</th>\n",
       "      <td>身为产品，如何做向上管理？</td>\n",
       "      <td>2019-07-16</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10835</th>\n",
       "      <td>限额免费丨文案小白必学：3天摆脱写不出文案、文案逻辑乱问题</td>\n",
       "      <td>2019-07-16</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10836</th>\n",
       "      <td>一份有关如何提高创意思维的清单！</td>\n",
       "      <td>2019-07-16</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                               title create_time  \\\n",
       "10832             我在 P2P 公司，做了三年活动运营  2019-07-16   \n",
       "10833           同一个岗位，他的薪资凭什么是别人的2倍？  2019-07-16   \n",
       "10834                  身为产品，如何做向上管理？  2019-07-16   \n",
       "10835  限额免费丨文案小白必学：3天摆脱写不出文案、文案逻辑乱问题  2019-07-16   \n",
       "10836               一份有关如何提高创意思维的清单！  2019-07-16   \n",
       "\n",
       "                                                    link  \n",
       "10832  http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "10833  http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "10834  http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "10835  http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "10836  http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  "
      ]
     },
     "execution_count": 87,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_url_out.tail(5)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 181,
   "metadata": {},
   "outputs": [],
   "source": [
    "# tagging 标记\n",
    "tagging_list = [\"产品运营\",\"产品经理\",\"产品设计\",\\\n",
    "                \"新媒体运营\",\"用户\",\"用户体验\",\\\n",
    "                \"文案\",\"数据分析\"] #overwritable\n",
    "\n",
    "v_v_list = []\n",
    "\n",
    "for tag in tagging_list:\n",
    "    index_list = df_url_out [ df_url_out.title.str.contains(tag) ].index.tolist()\n",
    "    v_v_pairs = pd.DataFrame({tag:index_list}).melt().set_index(\"value\")\n",
    "    v_v_list.append(v_v_pairs)\n",
    "\n",
    "df_cat = v_v_list[0]\n",
    "for d in v_v_list:\n",
    "    df_cat.update(d)\n",
    "    \n",
    "# # 尚未标记内容\n",
    "# df_url_out.loc [ df_cat.query('variable==\"\"').index ]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 182,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>title</th>\n",
       "      <th>create_time</th>\n",
       "      <th>link</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>269</th>\n",
       "      <td>如何向女友解释 Push 原理</td>\n",
       "      <td>2020-03-22</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>270</th>\n",
       "      <td>产品经理不懂技术容易踩的坑！看了我都心疼……</td>\n",
       "      <td>2020-03-22</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>271</th>\n",
       "      <td>小众产品不存在竞争</td>\n",
       "      <td>2020-03-22</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>272</th>\n",
       "      <td>项目复盘，给我上了一场「产品规划」课</td>\n",
       "      <td>2020-03-22</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>273</th>\n",
       "      <td>请警惕：这些营销定律逐渐在失效</td>\n",
       "      <td>2020-03-22</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10832</th>\n",
       "      <td>我在 P2P 公司，做了三年活动运营</td>\n",
       "      <td>2019-07-16</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10833</th>\n",
       "      <td>同一个岗位，他的薪资凭什么是别人的2倍？</td>\n",
       "      <td>2019-07-16</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10834</th>\n",
       "      <td>身为产品，如何做向上管理？</td>\n",
       "      <td>2019-07-16</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10835</th>\n",
       "      <td>限额免费丨文案小白必学：3天摆脱写不出文案、文案逻辑乱问题</td>\n",
       "      <td>2019-07-16</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10836</th>\n",
       "      <td>一份有关如何提高创意思维的清单！</td>\n",
       "      <td>2019-07-16</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>9479 rows × 3 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                               title create_time  \\\n",
       "269                  如何向女友解释 Push 原理  2020-03-22   \n",
       "270           产品经理不懂技术容易踩的坑！看了我都心疼……  2020-03-22   \n",
       "271                        小众产品不存在竞争  2020-03-22   \n",
       "272               项目复盘，给我上了一场「产品规划」课  2020-03-22   \n",
       "273                  请警惕：这些营销定律逐渐在失效  2020-03-22   \n",
       "...                              ...         ...   \n",
       "10832             我在 P2P 公司，做了三年活动运营  2019-07-16   \n",
       "10833           同一个岗位，他的薪资凭什么是别人的2倍？  2019-07-16   \n",
       "10834                  身为产品，如何做向上管理？  2019-07-16   \n",
       "10835  限额免费丨文案小白必学：3天摆脱写不出文案、文案逻辑乱问题  2019-07-16   \n",
       "10836               一份有关如何提高创意思维的清单！  2019-07-16   \n",
       "\n",
       "                                                    link  \n",
       "269    http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "270    http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "271    http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "272    http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "273    http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "...                                                  ...  \n",
       "10832  http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "10833  http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "10834  http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "10835  http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "10836  http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "\n",
       "[9479 rows x 3 columns]"
      ]
     },
     "execution_count": 182,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_url_out[df_url_out.duplicated()]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 183,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>title</th>\n",
       "      <th>create_time</th>\n",
       "      <th>link</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>这是产品新人真正需要的一次项目评审会，全网直播</td>\n",
       "      <td>2020-05-16</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>一个有趣的404页面，是如何体现的？</td>\n",
       "      <td>2020-05-16</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>6000字，告诉你社群运营怎么做</td>\n",
       "      <td>2020-05-16</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>公开课 | 我0基础，距离大厂还有多远？</td>\n",
       "      <td>2020-05-16</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>B端产品，四招搞定老板不靠谱需求</td>\n",
       "      <td>2020-05-16</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1379</th>\n",
       "      <td>我在 P2P 公司，做了三年活动运营</td>\n",
       "      <td>2019-07-16</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1380</th>\n",
       "      <td>同一个岗位，他的薪资凭什么是别人的2倍？</td>\n",
       "      <td>2019-07-16</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1381</th>\n",
       "      <td>身为产品，如何做向上管理？</td>\n",
       "      <td>2019-07-16</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1382</th>\n",
       "      <td>限额免费丨文案小白必学：3天摆脱写不出文案、文案逻辑乱问题</td>\n",
       "      <td>2019-07-16</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1383</th>\n",
       "      <td>一份有关如何提高创意思维的清单！</td>\n",
       "      <td>2019-07-16</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>1358 rows × 3 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                              title create_time  \\\n",
       "0           这是产品新人真正需要的一次项目评审会，全网直播  2020-05-16   \n",
       "1                一个有趣的404页面，是如何体现的？  2020-05-16   \n",
       "2                  6000字，告诉你社群运营怎么做  2020-05-16   \n",
       "3              公开课 | 我0基础，距离大厂还有多远？  2020-05-16   \n",
       "4                  B端产品，四招搞定老板不靠谱需求  2020-05-16   \n",
       "...                             ...         ...   \n",
       "1379             我在 P2P 公司，做了三年活动运营  2019-07-16   \n",
       "1380           同一个岗位，他的薪资凭什么是别人的2倍？  2019-07-16   \n",
       "1381                  身为产品，如何做向上管理？  2019-07-16   \n",
       "1382  限额免费丨文案小白必学：3天摆脱写不出文案、文案逻辑乱问题  2019-07-16   \n",
       "1383               一份有关如何提高创意思维的清单！  2019-07-16   \n",
       "\n",
       "                                                   link  \n",
       "0     http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "1     http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "2     http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "3     http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "4     http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "...                                                 ...  \n",
       "1379  http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "1380  http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "1381  http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "1382  http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "1383  http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "\n",
       "[1358 rows x 3 columns]"
      ]
     },
     "execution_count": 183,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_url_out[~df_url_out.duplicated()]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 184,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>title</th>\n",
       "      <th>create_time</th>\n",
       "      <th>link</th>\n",
       "      <th>variable</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>这是产品新人真正需要的一次项目评审会，全网直播</td>\n",
       "      <td>2020-05-16</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "      <td>无法分类</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>一个有趣的404页面，是如何体现的？</td>\n",
       "      <td>2020-05-16</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "      <td>无法分类</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>6000字，告诉你社群运营怎么做</td>\n",
       "      <td>2020-05-16</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "      <td>无法分类</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>公开课 | 我0基础，距离大厂还有多远？</td>\n",
       "      <td>2020-05-16</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "      <td>无法分类</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>B端产品，四招搞定老板不靠谱需求</td>\n",
       "      <td>2020-05-16</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "      <td>无法分类</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10832</th>\n",
       "      <td>我在 P2P 公司，做了三年活动运营</td>\n",
       "      <td>2019-07-16</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "      <td>无法分类</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10833</th>\n",
       "      <td>同一个岗位，他的薪资凭什么是别人的2倍？</td>\n",
       "      <td>2019-07-16</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "      <td>无法分类</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10834</th>\n",
       "      <td>身为产品，如何做向上管理？</td>\n",
       "      <td>2019-07-16</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "      <td>无法分类</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10835</th>\n",
       "      <td>限额免费丨文案小白必学：3天摆脱写不出文案、文案逻辑乱问题</td>\n",
       "      <td>2019-07-16</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "      <td>无法分类</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10836</th>\n",
       "      <td>一份有关如何提高创意思维的清单！</td>\n",
       "      <td>2019-07-16</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "      <td>无法分类</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>10837 rows × 4 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                               title create_time  \\\n",
       "0            这是产品新人真正需要的一次项目评审会，全网直播  2020-05-16   \n",
       "1                 一个有趣的404页面，是如何体现的？  2020-05-16   \n",
       "2                   6000字，告诉你社群运营怎么做  2020-05-16   \n",
       "3               公开课 | 我0基础，距离大厂还有多远？  2020-05-16   \n",
       "4                   B端产品，四招搞定老板不靠谱需求  2020-05-16   \n",
       "...                              ...         ...   \n",
       "10832             我在 P2P 公司，做了三年活动运营  2019-07-16   \n",
       "10833           同一个岗位，他的薪资凭什么是别人的2倍？  2019-07-16   \n",
       "10834                  身为产品，如何做向上管理？  2019-07-16   \n",
       "10835  限额免费丨文案小白必学：3天摆脱写不出文案、文案逻辑乱问题  2019-07-16   \n",
       "10836               一份有关如何提高创意思维的清单！  2019-07-16   \n",
       "\n",
       "                                                    link variable  \n",
       "0      http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...     无法分类  \n",
       "1      http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...     无法分类  \n",
       "2      http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...     无法分类  \n",
       "3      http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...     无法分类  \n",
       "4      http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...     无法分类  \n",
       "...                                                  ...      ...  \n",
       "10832  http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...     无法分类  \n",
       "10833  http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...     无法分类  \n",
       "10834  http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...     无法分类  \n",
       "10835  http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...     无法分类  \n",
       "10836  http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...     无法分类  \n",
       "\n",
       "[10837 rows x 4 columns]"
      ]
     },
     "execution_count": 184,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_o = df_url_out.join(df_cat).replace(\"\", np.nan).fillna(\"无法分类\")\n",
    "df_o"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 160,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>title</th>\n",
       "      <th>create_time</th>\n",
       "      <th>link</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>219</th>\n",
       "      <td>运营进阶须知：如何掌握产品运营画布九要素？</td>\n",
       "      <td>2020-04-02</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>553</th>\n",
       "      <td>这些产品运营的坑，摔了也不稀奇</td>\n",
       "      <td>2020-01-16</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>698</th>\n",
       "      <td>产品运营系统模型，提高你的产品思维</td>\n",
       "      <td>2019-12-15</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>901</th>\n",
       "      <td>7个基础心理学原理，玩转产品运营</td>\n",
       "      <td>2019-11-02</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1075</th>\n",
       "      <td>新消费浪潮，流量吃紧时代下，产品运营怎么做？</td>\n",
       "      <td>2019-09-21</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1146</th>\n",
       "      <td>9月广州产品运营大会报名：16小时听懂能力变革的新趋势</td>\n",
       "      <td>2019-09-06</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1152</th>\n",
       "      <td>5大趋势，说清楚产品运营人的能力变革</td>\n",
       "      <td>2019-09-05</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1250</th>\n",
       "      <td>9月广州产品运营大会，一起来聊聊产品新思维，运营新机会！</td>\n",
       "      <td>2019-08-15</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                             title create_time  \\\n",
       "219          运营进阶须知：如何掌握产品运营画布九要素？  2020-04-02   \n",
       "553                这些产品运营的坑，摔了也不稀奇  2020-01-16   \n",
       "698              产品运营系统模型，提高你的产品思维  2019-12-15   \n",
       "901               7个基础心理学原理，玩转产品运营  2019-11-02   \n",
       "1075        新消费浪潮，流量吃紧时代下，产品运营怎么做？  2019-09-21   \n",
       "1146   9月广州产品运营大会报名：16小时听懂能力变革的新趋势  2019-09-06   \n",
       "1152            5大趋势，说清楚产品运营人的能力变革  2019-09-05   \n",
       "1250  9月广州产品运营大会，一起来聊聊产品新思维，运营新机会！  2019-08-15   \n",
       "\n",
       "                                                   link  \n",
       "219   http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "553   http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "698   http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "901   http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "1075  http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "1146  http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "1152  http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "1250  http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  "
      ]
     },
     "execution_count": 160,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "cpyy = df_url_out[df_url_out.title.str.contains(\"产品运营\")]\n",
    "cpyy"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 180,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>title</th>\n",
       "      <th>create_time</th>\n",
       "      <th>link</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>10 年产品人：互联网产品经理的本质是什么？</td>\n",
       "      <td>2020-05-15</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>0基础转岗数据产品经理，这3个问题必须搞明白！</td>\n",
       "      <td>2020-05-15</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>26</th>\n",
       "      <td>产品经理不懂技术容易踩的坑！看了我都心疼……</td>\n",
       "      <td>2020-03-22</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>33</th>\n",
       "      <td>限时免费 | 如何入门数据产品经理？这里有个好方法！</td>\n",
       "      <td>2020-03-21</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>76</th>\n",
       "      <td>阿里 / 腾讯等公司招聘 AI 产品经理，这些能力是面试重点</td>\n",
       "      <td>2020-05-01</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10792</th>\n",
       "      <td>我在阿里这 10 年：从 BI 到产品经理，曾被程序员踢翻桌子骂</td>\n",
       "      <td>2019-07-19</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10797</th>\n",
       "      <td>直播预约丨7月杭州产品经理大会，一场不可错过的知识盛宴！</td>\n",
       "      <td>2019-07-19</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10814</th>\n",
       "      <td>电商产品经理的三级火箭模型，每一级都是蜕变</td>\n",
       "      <td>2019-07-20</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10815</th>\n",
       "      <td>我在阿里这 10 年：从 BI 到产品经理，曾被程序员踢翻桌子骂</td>\n",
       "      <td>2019-07-19</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10820</th>\n",
       "      <td>直播预约丨7月杭州产品经理大会，一场不可错过的知识盛宴！</td>\n",
       "      <td>2019-07-19</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>1400 rows × 3 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                                  title create_time  \\\n",
       "5                10 年产品人：互联网产品经理的本质是什么？  2020-05-15   \n",
       "6               0基础转岗数据产品经理，这3个问题必须搞明白！  2020-05-15   \n",
       "26               产品经理不懂技术容易踩的坑！看了我都心疼……  2020-03-22   \n",
       "33           限时免费 | 如何入门数据产品经理？这里有个好方法！  2020-03-21   \n",
       "76       阿里 / 腾讯等公司招聘 AI 产品经理，这些能力是面试重点  2020-05-01   \n",
       "...                                 ...         ...   \n",
       "10792  我在阿里这 10 年：从 BI 到产品经理，曾被程序员踢翻桌子骂  2019-07-19   \n",
       "10797      直播预约丨7月杭州产品经理大会，一场不可错过的知识盛宴！  2019-07-19   \n",
       "10814             电商产品经理的三级火箭模型，每一级都是蜕变  2019-07-20   \n",
       "10815  我在阿里这 10 年：从 BI 到产品经理，曾被程序员踢翻桌子骂  2019-07-19   \n",
       "10820      直播预约丨7月杭州产品经理大会，一场不可错过的知识盛宴！  2019-07-19   \n",
       "\n",
       "                                                    link  \n",
       "5      http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "6      http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "26     http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "33     http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "76     http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "...                                                  ...  \n",
       "10792  http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "10797  http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "10814  http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "10815  http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "10820  http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "\n",
       "[1400 rows x 3 columns]"
      ]
     },
     "execution_count": 180,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "cpjl = df_url_out[df_url_out.title.str.contains(\"产品经理\")]\n",
    "cpjl"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 162,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>title</th>\n",
       "      <th>create_time</th>\n",
       "      <th>link</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>46</th>\n",
       "      <td>这套微信小程序产品设计法则，助你抢占2020年小程序红利！</td>\n",
       "      <td>2020-03-18</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>62</th>\n",
       "      <td>为什么后台产品设计能力，对于电商产品人这么重要？</td>\n",
       "      <td>2020-05-04</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>63</th>\n",
       "      <td>从产品设计角度，看蚂蚁森林的增长飞轮</td>\n",
       "      <td>2020-05-04</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>86</th>\n",
       "      <td>你有一份小程序产品设计指南，请查收</td>\n",
       "      <td>2020-04-29</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>100</th>\n",
       "      <td>叫了滴滴代驾后，我思考了下背后的产品设计</td>\n",
       "      <td>2020-04-26</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>113</th>\n",
       "      <td>产品设计被列入产品晋升考察范围，新人不会Axure原型有多亏？</td>\n",
       "      <td>2020-04-24</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>150</th>\n",
       "      <td>来，跟火爆全网的「动物之森」学游戏化产品设计！</td>\n",
       "      <td>2020-04-16</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>158</th>\n",
       "      <td>微信小程序产品设计要点拆解，助你抢占2020小程序红利！</td>\n",
       "      <td>2020-04-15</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>290</th>\n",
       "      <td>这套微信小程序产品设计法则，助你抢占2020年小程序红利！</td>\n",
       "      <td>2020-03-18</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>355</th>\n",
       "      <td>掌握这套后台产品设计法则，晋升路上少走1-3年弯路</td>\n",
       "      <td>2020-03-05</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>444</th>\n",
       "      <td>这里有一套微信小程序产品设计法则，2020年产品人必备！</td>\n",
       "      <td>2020-02-16</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1157</th>\n",
       "      <td>从人性角度，看抖音的产品设计</td>\n",
       "      <td>2019-09-04</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                title create_time  \\\n",
       "46      这套微信小程序产品设计法则，助你抢占2020年小程序红利！  2020-03-18   \n",
       "62           为什么后台产品设计能力，对于电商产品人这么重要？  2020-05-04   \n",
       "63                 从产品设计角度，看蚂蚁森林的增长飞轮  2020-05-04   \n",
       "86                  你有一份小程序产品设计指南，请查收  2020-04-29   \n",
       "100              叫了滴滴代驾后，我思考了下背后的产品设计  2020-04-26   \n",
       "113   产品设计被列入产品晋升考察范围，新人不会Axure原型有多亏？  2020-04-24   \n",
       "150           来，跟火爆全网的「动物之森」学游戏化产品设计！  2020-04-16   \n",
       "158      微信小程序产品设计要点拆解，助你抢占2020小程序红利！  2020-04-15   \n",
       "290     这套微信小程序产品设计法则，助你抢占2020年小程序红利！  2020-03-18   \n",
       "355         掌握这套后台产品设计法则，晋升路上少走1-3年弯路  2020-03-05   \n",
       "444      这里有一套微信小程序产品设计法则，2020年产品人必备！  2020-02-16   \n",
       "1157                   从人性角度，看抖音的产品设计  2019-09-04   \n",
       "\n",
       "                                                   link  \n",
       "46    http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "62    http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "63    http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "86    http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "100   http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "113   http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "150   http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "158   http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "290   http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "355   http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "444   http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "1157  http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  "
      ]
     },
     "execution_count": 162,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "cpsj = df_url_out[df_url_out.title.str.contains(\"产品设计\")]\n",
    "cpsj"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 163,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>title</th>\n",
       "      <th>create_time</th>\n",
       "      <th>link</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>2020年，为什么建议你尝试做B端新媒体运营？</td>\n",
       "      <td>2020-05-13</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>98</th>\n",
       "      <td>线上课程 | 想做好新媒体运营，你需要先了解行业规则！</td>\n",
       "      <td>2020-04-27</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>265</th>\n",
       "      <td>别再到处收藏资料包啦！3步提升你的新媒体运营能力！</td>\n",
       "      <td>2020-03-23</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>347</th>\n",
       "      <td>新媒体运营新人，如何体系化入门快速完善初阶必备技能？</td>\n",
       "      <td>2020-03-07</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>376</th>\n",
       "      <td>B端新媒体运营思考：聊聊个人价值、标杆与内容输出</td>\n",
       "      <td>2020-03-01</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>605</th>\n",
       "      <td>掌握这3个技能，新手也能快速入门新媒体运营！</td>\n",
       "      <td>2020-01-05</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>897</th>\n",
       "      <td>离月薪2w的新媒体运营，你还差什么？</td>\n",
       "      <td>2019-11-03</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1114</th>\n",
       "      <td>6周掌握新媒体运营的3大核心能力，你Get到了吗？</td>\n",
       "      <td>2019-09-13</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1172</th>\n",
       "      <td>新媒体运营新人必须掌握的3大核心能力，你Get到了吗？</td>\n",
       "      <td>2019-09-01</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                            title create_time  \\\n",
       "17        2020年，为什么建议你尝试做B端新媒体运营？  2020-05-13   \n",
       "98    线上课程 | 想做好新媒体运营，你需要先了解行业规则！  2020-04-27   \n",
       "265     别再到处收藏资料包啦！3步提升你的新媒体运营能力！  2020-03-23   \n",
       "347    新媒体运营新人，如何体系化入门快速完善初阶必备技能？  2020-03-07   \n",
       "376      B端新媒体运营思考：聊聊个人价值、标杆与内容输出  2020-03-01   \n",
       "605        掌握这3个技能，新手也能快速入门新媒体运营！  2020-01-05   \n",
       "897            离月薪2w的新媒体运营，你还差什么？  2019-11-03   \n",
       "1114    6周掌握新媒体运营的3大核心能力，你Get到了吗？  2019-09-13   \n",
       "1172  新媒体运营新人必须掌握的3大核心能力，你Get到了吗？  2019-09-01   \n",
       "\n",
       "                                                   link  \n",
       "17    http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "98    http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "265   http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "347   http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "376   http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "605   http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "897   http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "1114  http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "1172  http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  "
      ]
     },
     "execution_count": 163,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "xmtyy = df_url_out[df_url_out.title.str.contains(\"新媒体运营\")]\n",
    "xmtyy"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 165,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>title</th>\n",
       "      <th>create_time</th>\n",
       "      <th>link</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>以餐饮排队系统为例，拆解B端产品用户访谈</td>\n",
       "      <td>2020-05-14</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>34</th>\n",
       "      <td>如何优化迪士尼排队流程，提升用户体验？</td>\n",
       "      <td>2020-03-21</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>37</th>\n",
       "      <td>三个步骤，手把手教你做用户增长</td>\n",
       "      <td>2020-03-20</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>44</th>\n",
       "      <td>都行业级重大创新了，用户为啥不买账？</td>\n",
       "      <td>2020-03-19</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>61</th>\n",
       "      <td>为什么我的 “产品卖点” 比别人多，用户还是买了别人的？</td>\n",
       "      <td>2020-05-04</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1320</th>\n",
       "      <td>Keep 的用户体验，有哪些特点</td>\n",
       "      <td>2019-07-30</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1326</th>\n",
       "      <td>为什么辛辛苦苦做的产品功能，用户却不喜欢？</td>\n",
       "      <td>2019-07-29</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1341</th>\n",
       "      <td>你的产品那么好，跟用户真的有关吗？</td>\n",
       "      <td>2019-07-25</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1355</th>\n",
       "      <td>为什么要设计一些“不好”的用户体验？</td>\n",
       "      <td>2019-07-22</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1357</th>\n",
       "      <td>都已经付费了，如何防止用户流失？</td>\n",
       "      <td>2019-07-22</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>74 rows × 3 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                             title create_time  \\\n",
       "14            以餐饮排队系统为例，拆解B端产品用户访谈  2020-05-14   \n",
       "34             如何优化迪士尼排队流程，提升用户体验？  2020-03-21   \n",
       "37                 三个步骤，手把手教你做用户增长  2020-03-20   \n",
       "44              都行业级重大创新了，用户为啥不买账？  2020-03-19   \n",
       "61    为什么我的 “产品卖点” 比别人多，用户还是买了别人的？  2020-05-04   \n",
       "...                            ...         ...   \n",
       "1320              Keep 的用户体验，有哪些特点  2019-07-30   \n",
       "1326         为什么辛辛苦苦做的产品功能，用户却不喜欢？  2019-07-29   \n",
       "1341             你的产品那么好，跟用户真的有关吗？  2019-07-25   \n",
       "1355            为什么要设计一些“不好”的用户体验？  2019-07-22   \n",
       "1357              都已经付费了，如何防止用户流失？  2019-07-22   \n",
       "\n",
       "                                                   link  \n",
       "14    http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "34    http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "37    http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "44    http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "61    http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "...                                                 ...  \n",
       "1320  http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "1326  http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "1341  http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "1355  http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "1357  http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "\n",
       "[74 rows x 3 columns]"
      ]
     },
     "execution_count": 165,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "yh = df_url_out[df_url_out.title.str.contains(\"用户\")]\n",
    "yh"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 166,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>title</th>\n",
       "      <th>create_time</th>\n",
       "      <th>link</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>58</th>\n",
       "      <td>文案调动不了情绪，还算什么好文案？</td>\n",
       "      <td>2020-05-05</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>114</th>\n",
       "      <td>如何写出一份标准的活动文案？（附实例）</td>\n",
       "      <td>2020-04-24</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>121</th>\n",
       "      <td>有效提升文案能力，还能做文案副业的方法，在这里！</td>\n",
       "      <td>2020-04-22</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>137</th>\n",
       "      <td>3 连问，解决你不知道卖点文案怎么写的问题</td>\n",
       "      <td>2020-04-19</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>151</th>\n",
       "      <td>想提升文案能力？想做文案副业？500强文案内训师教你秘诀！</td>\n",
       "      <td>2020-04-16</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10789</th>\n",
       "      <td>限额免费丨文案小白必学：3天摆脱写不出文案、文案逻辑乱问题</td>\n",
       "      <td>2019-07-16</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10793</th>\n",
       "      <td>文案写不出？没价值？运营人一定要收下这份文案技巧</td>\n",
       "      <td>2019-07-19</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10812</th>\n",
       "      <td>限额免费丨文案小白必学：3天摆脱写不出文案、文案逻辑乱问题</td>\n",
       "      <td>2019-07-16</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10816</th>\n",
       "      <td>文案写不出？没价值？运营人一定要收下这份文案技巧</td>\n",
       "      <td>2019-07-19</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10835</th>\n",
       "      <td>限额免费丨文案小白必学：3天摆脱写不出文案、文案逻辑乱问题</td>\n",
       "      <td>2019-07-16</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>858 rows × 3 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                               title create_time  \\\n",
       "58                 文案调动不了情绪，还算什么好文案？  2020-05-05   \n",
       "114              如何写出一份标准的活动文案？（附实例）  2020-04-24   \n",
       "121         有效提升文案能力，还能做文案副业的方法，在这里！  2020-04-22   \n",
       "137            3 连问，解决你不知道卖点文案怎么写的问题  2020-04-19   \n",
       "151    想提升文案能力？想做文案副业？500强文案内训师教你秘诀！  2020-04-16   \n",
       "...                              ...         ...   \n",
       "10789  限额免费丨文案小白必学：3天摆脱写不出文案、文案逻辑乱问题  2019-07-16   \n",
       "10793       文案写不出？没价值？运营人一定要收下这份文案技巧  2019-07-19   \n",
       "10812  限额免费丨文案小白必学：3天摆脱写不出文案、文案逻辑乱问题  2019-07-16   \n",
       "10816       文案写不出？没价值？运营人一定要收下这份文案技巧  2019-07-19   \n",
       "10835  限额免费丨文案小白必学：3天摆脱写不出文案、文案逻辑乱问题  2019-07-16   \n",
       "\n",
       "                                                    link  \n",
       "58     http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "114    http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "121    http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "137    http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "151    http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "...                                                  ...  \n",
       "10789  http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "10793  http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "10812  http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "10816  http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "10835  http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "\n",
       "[858 rows x 3 columns]"
      ]
     },
     "execution_count": 166,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "wn = df_url_out[df_url_out.title.str.contains(\"文案\")]\n",
    "wn"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 176,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>title</th>\n",
       "      <th>create_time</th>\n",
       "      <th>link</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>34</th>\n",
       "      <td>如何优化迪士尼排队流程，提升用户体验？</td>\n",
       "      <td>2020-03-21</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>75</th>\n",
       "      <td>不能老拿「用户体验」说事儿</td>\n",
       "      <td>2020-05-01</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>278</th>\n",
       "      <td>如何优化迪士尼排队流程，提升用户体验？</td>\n",
       "      <td>2020-03-21</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>412</th>\n",
       "      <td>区块链产品，会对用户体验带来哪些影响？</td>\n",
       "      <td>2020-02-23</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>882</th>\n",
       "      <td>状态定义做得好，用户体验更进一步</td>\n",
       "      <td>2019-11-06</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>925</th>\n",
       "      <td>万字长文：刷新对用户体验设计的全新认知</td>\n",
       "      <td>2019-10-28</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1320</th>\n",
       "      <td>Keep 的用户体验，有哪些特点</td>\n",
       "      <td>2019-07-30</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1355</th>\n",
       "      <td>为什么要设计一些“不好”的用户体验？</td>\n",
       "      <td>2019-07-22</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                    title create_time  \\\n",
       "34    如何优化迪士尼排队流程，提升用户体验？  2020-03-21   \n",
       "75          不能老拿「用户体验」说事儿  2020-05-01   \n",
       "278   如何优化迪士尼排队流程，提升用户体验？  2020-03-21   \n",
       "412   区块链产品，会对用户体验带来哪些影响？  2020-02-23   \n",
       "882      状态定义做得好，用户体验更进一步  2019-11-06   \n",
       "925   万字长文：刷新对用户体验设计的全新认知  2019-10-28   \n",
       "1320     Keep 的用户体验，有哪些特点  2019-07-30   \n",
       "1355   为什么要设计一些“不好”的用户体验？  2019-07-22   \n",
       "\n",
       "                                                   link  \n",
       "34    http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "75    http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "278   http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "412   http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "882   http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "925   http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "1320  http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "1355  http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  "
      ]
     },
     "execution_count": 176,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "yhty = df_url_out[df_url_out.title.str.contains(\"用户体验\")]\n",
    "yhty"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 167,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>title</th>\n",
       "      <th>create_time</th>\n",
       "      <th>link</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>23</th>\n",
       "      <td>公开课｜如何掌握一线大厂看中的数据分析能力？</td>\n",
       "      <td>2020-05-12</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>108</th>\n",
       "      <td>数据分析不是“做统计”，老板更关心如何驱动增长</td>\n",
       "      <td>2020-04-25</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>168</th>\n",
       "      <td>明晚直播｜如何掌握一线大厂看中的数据分析能力？</td>\n",
       "      <td>2020-04-13</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>191</th>\n",
       "      <td>互联网人必备的数据分析技能，你真的掌握了吗？</td>\n",
       "      <td>2020-04-08</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>228</th>\n",
       "      <td>产品、运营人必备的数据分析能力，戳此提升！</td>\n",
       "      <td>2020-03-31</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>340</th>\n",
       "      <td>数据分析，是一个越早掌握越对你有利的技能！</td>\n",
       "      <td>2020-03-08</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>416</th>\n",
       "      <td>公开课 | 0基础入门产品，1小时带你了解数据分析技能</td>\n",
       "      <td>2020-02-22</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>420</th>\n",
       "      <td>一个小故事告诉你：如何写好数据分析报告？</td>\n",
       "      <td>2020-02-21</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>542</th>\n",
       "      <td>数据分析师拯救猪队友的操作指南.doc</td>\n",
       "      <td>2020-01-18</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>570</th>\n",
       "      <td>90%的人都做不好数据分析！被淘汰的会是你吗？</td>\n",
       "      <td>2020-01-12</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>626</th>\n",
       "      <td>为什么你做的数据分析没有用？因为你还没有掌握这套流程和方法！</td>\n",
       "      <td>2019-12-30</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>697</th>\n",
       "      <td>还在“假装”看数据？戳这里get一套实用的数据分析方法</td>\n",
       "      <td>2019-12-15</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>825</th>\n",
       "      <td>数据分析全凭“直觉”？戳这里get一套实用数据分析法</td>\n",
       "      <td>2019-11-19</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>867</th>\n",
       "      <td>86%的人面对数据只会做“统计”，数据分析如何避免沦为形式?</td>\n",
       "      <td>2019-11-09</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>905</th>\n",
       "      <td>谷歌母公司执行董事长：数据分析是年轻人都应该学习的职业技能！</td>\n",
       "      <td>2019-11-01</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>993</th>\n",
       "      <td>为什么说拥有数据思维，懂数据分析，更有核心竞争力？</td>\n",
       "      <td>2019-10-13</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1030</th>\n",
       "      <td>数据分析没有思路？15天帮你快速入门数据分析！</td>\n",
       "      <td>2019-10-03</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1109</th>\n",
       "      <td>只会数据“统计”不会分析? 15天快速掌握一套数据分析流程和方法</td>\n",
       "      <td>2019-09-14</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1155</th>\n",
       "      <td>有数据思维的人，是怎样做数据分析的?</td>\n",
       "      <td>2019-09-05</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1175</th>\n",
       "      <td>如何避免数据分析沦为形式? 15天，提升你的数据思维！</td>\n",
       "      <td>2019-08-31</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1232</th>\n",
       "      <td>深圳线下分享会丨如何用数据分析，驱动产品和用户增长？</td>\n",
       "      <td>2019-08-20</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1239</th>\n",
       "      <td>15天，从零到一快速入门数据分析</td>\n",
       "      <td>2019-08-18</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1307</th>\n",
       "      <td>从业5年，我总结了一套数据分析的实用方法论</td>\n",
       "      <td>2019-08-02</td>\n",
       "      <td>http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                 title create_time  \\\n",
       "23              公开课｜如何掌握一线大厂看中的数据分析能力？  2020-05-12   \n",
       "108            数据分析不是“做统计”，老板更关心如何驱动增长  2020-04-25   \n",
       "168            明晚直播｜如何掌握一线大厂看中的数据分析能力？  2020-04-13   \n",
       "191             互联网人必备的数据分析技能，你真的掌握了吗？  2020-04-08   \n",
       "228              产品、运营人必备的数据分析能力，戳此提升！  2020-03-31   \n",
       "340              数据分析，是一个越早掌握越对你有利的技能！  2020-03-08   \n",
       "416        公开课 | 0基础入门产品，1小时带你了解数据分析技能  2020-02-22   \n",
       "420               一个小故事告诉你：如何写好数据分析报告？  2020-02-21   \n",
       "542                数据分析师拯救猪队友的操作指南.doc  2020-01-18   \n",
       "570            90%的人都做不好数据分析！被淘汰的会是你吗？  2020-01-12   \n",
       "626     为什么你做的数据分析没有用？因为你还没有掌握这套流程和方法！  2019-12-30   \n",
       "697        还在“假装”看数据？戳这里get一套实用的数据分析方法  2019-12-15   \n",
       "825         数据分析全凭“直觉”？戳这里get一套实用数据分析法  2019-11-19   \n",
       "867     86%的人面对数据只会做“统计”，数据分析如何避免沦为形式?  2019-11-09   \n",
       "905     谷歌母公司执行董事长：数据分析是年轻人都应该学习的职业技能！  2019-11-01   \n",
       "993          为什么说拥有数据思维，懂数据分析，更有核心竞争力？  2019-10-13   \n",
       "1030           数据分析没有思路？15天帮你快速入门数据分析！  2019-10-03   \n",
       "1109  只会数据“统计”不会分析? 15天快速掌握一套数据分析流程和方法  2019-09-14   \n",
       "1155                有数据思维的人，是怎样做数据分析的?  2019-09-05   \n",
       "1175       如何避免数据分析沦为形式? 15天，提升你的数据思维！  2019-08-31   \n",
       "1232        深圳线下分享会丨如何用数据分析，驱动产品和用户增长？  2019-08-20   \n",
       "1239                  15天，从零到一快速入门数据分析  2019-08-18   \n",
       "1307             从业5年，我总结了一套数据分析的实用方法论  2019-08-02   \n",
       "\n",
       "                                                   link  \n",
       "23    http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "108   http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "168   http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "191   http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "228   http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "340   http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "416   http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "420   http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "542   http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "570   http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "626   http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "697   http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "825   http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "867   http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "905   http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "993   http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "1030  http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "1109  http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "1155  http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "1175  http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "1232  http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "1239  http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  \n",
       "1307  http://mp.weixin.qq.com/s?__biz=MjM5OTEwNjI2MA...  "
      ]
     },
     "execution_count": 167,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "sjfx = df_url_out[df_url_out.title.str.contains(\"数据分析\")]\n",
    "sjfx"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 输出"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 输出Excel表格\n",
    "\n",
    "# 总\n",
    "df_url_out.to_excel(\"总_人人都是产品经理.xlsx\",\\\n",
    "                sheet_name=\"总_人人都是产品经理\")\n",
    "\n",
    "# 产品运营\n",
    "cpyy.to_excel(\"产品运营.xlsx\",\\\n",
    "                sheet_name=\"产品运营\")\n",
    "\n",
    "# 产品经理\n",
    "cpjl.to_excel(\"产品经理.xlsx\",\\\n",
    "                sheet_name=\"产品经理\")\n",
    "\n",
    "# 产品设计\n",
    "cpsj.to_excel(\"产品设计.xlsx\",\\\n",
    "                sheet_name=\"产品设计\")\n",
    "\n",
    "# 新媒体运营\n",
    "xmtyy.to_excel(\"新媒体运营.xlsx\",\\\n",
    "                sheet_name=\"新媒体运营\")\n",
    "\n",
    "# 用户\n",
    "yh.to_excel(\"用户.xlsx\",\\\n",
    "                sheet_name=\"用户\")\n",
    "\n",
    "# 用户体验\n",
    "yhty.to_excel(\"用户体验.xlsx\",\\\n",
    "                sheet_name=\"用户体验\")\n",
    "\n",
    "# 文案\n",
    "wn.to_excel(\"文案.xlsx\",\\\n",
    "                sheet_name=\"文案\")\n",
    "\n",
    "# 数据分析\n",
    "sjfx.to_excel(\"数据分析.xlsx\",\\\n",
    "                sheet_name=\"数据分析\")\n",
    "\n",
    "# # 无法分类\n",
    "# df_o.to_excel(\"无法分类.xlsx\",\\\n",
    "#                 sheet_name=\"无法分类\")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.3"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {},
   "toc_section_display": true,
   "toc_window_display": true
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
